Hackers & Painters: Big Ideas From the Computer Age by Paul Graham


Author: Paul Graham
Language: eng
Format: mobi, pdf
Tags: Web, Computers, Linux, Social Aspects, Information Theory, Programming, General, Software Development & Engineering, Algorithms, Operating Systems, Internet, Design, Information Technology
ISBN: 9781449389550
Publisher: O'Reilly Media
Published: 2004-12-15T05:00:00+00:00


where w is the token whose probability we're calculating, good and bad are the hash tables I created in the first step, and G and B are the numbers of nonspam and spam messages, respectively.

I want to bias the probabilities slightly to avoid false positives, and by trial and error I've found that a good way to do it is to double all the numbers in good. This helps to distinguish between words that occasionally do occur in legitimate email and words that almost never do. I only consider words that occur more than five times in total (actually, because of the doubling, occurring three times in nonspam mail would be enough). And then there is the question of what probability to assign to words that occur in one corpus but not the other. Again by trial and error I chose .01 and .99. There may be room for tuning here, but as the corpus grows such tuning will happen automatically anyway.
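The rules above can be sketched in Python as a per-token probability function. This is a minimal sketch, not the author's code: the function name token_spam_probability and the min_count parameter are hypothetical, good and bad are assumed to be plain mappings from token to occurrence count, and returning None for rarely seen tokens is an assumption about how to represent "not considered". Capping each ratio at 1 is included because occurrences are divided by the number of messages, so a ratio can exceed 1.

```python
from collections.abc import Mapping
from typing import Optional


def token_spam_probability(
    w: str,
    good: Mapping[str, int],
    bad: Mapping[str, int],
    G: int,
    B: int,
    min_count: int = 5,  # hypothetical name for the "more than five times" rule
) -> Optional[float]:
    """Sketch of a per-token spam probability following the rules in the text."""
    g = 2 * good.get(w, 0)  # double nonspam counts to bias against false positives
    b = bad.get(w, 0)
    if g + b <= min_count:  # only consider tokens seen more than five times in total
        return None          # assumption: None means "no probability assigned"
    # Divide by the number of messages (G, B), not the corpus length, and cap
    # each ratio at 1 since a token can occur more often than there are messages.
    g_rate = min(1.0, g / G)
    b_rate = min(1.0, b / B)
    p = b_rate / (g_rate + b_rate)
    # Clamping yields .01 or .99 for tokens that appear in only one corpus.
    return max(0.01, min(0.99, p))
```

With the doubling, a token seen three times in nonspam mail (g = 6) passes the threshold, while one seen twice (g = 4) does not.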

The especially observant will notice that while I consider each corpus to be a single long stream of text for purposes of counting occurrences, I use the number of emails in each, rather than their combined length, as the divisor in calculating spam probabilities. This adds another slight bias to protect against false positives.
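The counting step described here, in which each corpus is treated as one long stream of text while the divisor is the number of messages, might look like the following sketch. The regular expression tokenizer and the name count_corpus are hypothetical; the excerpt does not specify how tokens are extracted.

```python
import re
from collections import Counter

# Hypothetical tokenizer: the excerpt does not define the tokenization rules.
TOKEN_RE = re.compile(r"[A-Za-z0-9$'-]+")


def count_corpus(messages: list[str]) -> tuple[Counter, int]:
    """Count token occurrences across all messages as one stream of text.

    Returns the occurrence table and the number of messages, which is what
    the probability calculation divides by (G or B), rather than total length.
    """
    counts: Counter = Counter()
    for msg in messages:
        counts.update(TOKEN_RE.findall(msg))
    return counts, len(messages)
```

For example, count_corpus(["hi there", "hi again hi"]) counts "hi" three times while reporting only two messages, which is why the ratio can exceed 1 and why dividing by message count adds a slight bias.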

When new mail arrives, it is scanned into tokens, and the fifteen most interesting tokens, where "interesting" is measured by how far a token's spam probability is from a neutral .5, are used to calculate the probability that the mail is spam. If w1, ..., w15 are the fifteen most interesting tokens, you calculate the combined probability thus:
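A hedged sketch of this step, assuming the usual naive-Bayes combination for independent token probabilities p1, ..., pn: P = (p1 p2 ... pn) / (p1 p2 ... pn + (1 - p1)(1 - p2) ... (1 - pn)). The function name combined_spam_probability is hypothetical, and it takes already computed per-token probabilities, so tokens without a probability are assumed to have been dropped beforehand.

```python
from math import prod


def combined_spam_probability(probs: list[float], n: int = 15) -> float:
    """Combine the n most interesting token probabilities into one score."""
    # "Interesting" = farthest from a neutral .5, as described in the text.
    interesting = sorted(probs, key=lambda p: abs(p - 0.5), reverse=True)[:n]
    spam = prod(interesting)                   # product of p_i
    ham = prod(1 - p for p in interesting)     # product of (1 - p_i)
    # Per-token values clamped to [.01, .99] keep both products nonzero.
    return spam / (spam + ham)
```

Tokens exactly at .5 contribute equally to both products and leave the result unchanged, which is why selecting by distance from .5 loses little information.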






Popular ebooks
Deep Learning with Python by François Chollet(12587)
Sass and Compass in Action by Wynn Netherland Nathan Weizenbaum Chris Eppstein Brandon Mathis(7787)
Grails in Action by Glen Smith Peter Ledbrook(7704)
Secrets of the JavaScript Ninja by John Resig Bear Bibeault(6422)
Kotlin in Action by Dmitry Jemerov(5072)
WordPress Plugin Development Cookbook by Yannick Lefebvre(3832)
Mastering Azure Security by Mustafa Toroman and Tom Janetscheck(3337)
Learning React: Functional Web Development with React and Redux by Banks Alex & Porcello Eve(3089)
Mastering Bitcoin: Programming the Open Blockchain by Andreas M. Antonopoulos(2873)
The Art Of Deception by Kevin Mitnick(2611)
Drugs Unlimited by Mike Power(2473)
The Innovators: How a Group of Hackers, Geniuses, and Geeks Created the Digital Revolution by Walter Isaacson(2332)
Kali Linux - An Ethical Hacker's Cookbook: End-to-end penetration testing solutions by Sharma Himanshu(2316)
Writing for the Web: Creating Compelling Web Content Using Words, Pictures and Sound (Eva Spring's Library) by Lynda Felder(2264)
A Blueprint for Production-Ready Web Applications: Leverage industry best practices to create complete web apps with Python, TypeScript, and AWS by Dr. Philip Jones(2229)
SEO 2018: Learn search engine optimization with smart internet marketing strategies by Adam Clarke(2195)
JavaScript by Example by S Dani Akash(2138)
DarkMarket by Misha Glenny(2086)
Wireless Hacking 101 by Karina Astudillo(2079)
Full-Stack React Projects by Shama Hoque(1992)